Fast Computation of the Tree Edit Distance between Unordered Trees Using IP Solvers
Authors
Abstract
We propose a new method for computing the tree edit distance between two unordered trees by problem encoding. Our method transforms an instance of the computation into an instance of an integer programming (IP) problem and solves it with an efficient IP solver. The tree edit distance is defined as the minimum cost of a sequence of edit operations (substitution, deletion, or insertion) that transforms one tree into another. Although computing it for unordered trees is NP-hard, several encoding techniques have been proposed to make the computation efficient in practice; one example is an encoding into the clique problem. As a new encoding method, we propose using IP solvers and give new IP formulations of the problem of finding a minimum-cost mapping between two unordered trees, where the minimum cost coincides exactly with the tree edit distance. Unlike the clique-based encoding, this lets us use general-purpose IP solvers, and our method can efficiently compute variations of the tree edit distance simply by adding constraints. Our experimental results on Glycan datasets and the CSLOGS Web log datasets show that our method is much faster than an existing method when the input trees have large degrees. We also show that two variations of the tree edit distance can be computed efficiently with IP solvers.
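To make the mapping view concrete, here is a minimal brute-force sketch. It assumes unit costs (1 for substituting a node with a different label, 1 per deletion, 1 per insertion) and a hypothetical parent-array representation of trees. A mapping is valid if it is one-to-one and preserves the ancestor relation (u1 is an ancestor of u2 iff v1 is an ancestor of v2). Sibling order is ignored because the trees are unordered. The sketch enumerates all mappings exhaustively instead of calling an IP solver, so it only illustrates the quantity the IP formulations minimize. It is not the paper's method, and it runs in exponential time.

```python
from itertools import combinations, permutations
from typing import List, Optional, Set, Tuple

# Hypothetical representation: node i is (label, parent index); the root's parent is None.
Tree = List[Tuple[str, Optional[int]]]


def proper_ancestors(tree: Tree) -> List[Set[int]]:
    """For each node, collect the indices of all its proper ancestors."""
    result = []
    for i in range(len(tree)):
        anc = set()
        parent = tree[i][1]
        while parent is not None:
            anc.add(parent)
            parent = tree[parent][1]
        result.append(anc)
    return result


def is_valid_mapping(mapping, anc1, anc2) -> bool:
    """Check the ancestor condition: u1 is an ancestor of u2 iff v1 is an ancestor of v2.

    One-to-one-ness is guaranteed by how mappings are enumerated below;
    sibling order is deliberately not checked because the trees are unordered.
    """
    for u1, v1 in mapping:
        for u2, v2 in mapping:
            if u1 != u2 and (u1 in anc1[u2]) != (v1 in anc2[v2]):
                return False
    return True


def unordered_ted(t1: Tree, t2: Tree) -> int:
    """Unit-cost unordered tree edit distance as a minimum-cost mapping (brute force)."""
    anc1, anc2 = proper_ancestors(t1), proper_ancestors(t2)
    n1, n2 = len(t1), len(t2)
    best = n1 + n2  # empty mapping: delete every node of t1, insert every node of t2
    for k in range(1, min(n1, n2) + 1):
        for nodes1 in combinations(range(n1), k):
            for nodes2 in permutations(range(n2), k):
                mapping = list(zip(nodes1, nodes2))
                if not is_valid_mapping(mapping, anc1, anc2):
                    continue
                # Mapped pairs cost a substitution if labels differ;
                # unmapped nodes are deleted (from t1) or inserted (into t2).
                relabels = sum(t1[u][0] != t2[v][0] for u, v in mapping)
                best = min(best, relabels + (n1 - k) + (n2 - k))
    return best


if __name__ == "__main__":
    star = [("a", None), ("b", 0), ("c", 0)]     # a with children b, c
    swapped = [("a", None), ("c", 0), ("b", 0)]  # same tree, children listed in the other order
    path = [("a", None), ("b", 0), ("c", 1)]     # a -> b -> c
    print(unordered_ted(star, swapped))  # 0: sibling order does not matter
    print(unordered_ted(star, path))     # 2: e.g. delete c, then reinsert it under b
```

One natural IP encoding replaces the exhaustive loops with three ingredients: a 0/1 variable for each node pair, at-most-one constraints per node, and a constraint forbidding any two pairs that violate the ancestor condition. This is an assumption for illustration; the paper's actual formulations may differ.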
Similar resources
A Clique-Based Method Using Dynamic Programming for Computing Edit Distance Between Unordered Trees
Many kinds of tree-structured data, such as RNA secondary structures, have become available due to the progress of techniques in the field of molecular biology. To analyze the tree-structured data, various measures for computing the similarity between them have been developed and applied. Among them, tree edit distance is one of the most widely used measures. However, the tree edit distance pro...
NED: An Inter-Graph Node Metric Based On Edit Distance
Node similarity is a fundamental problem in graph analytics. However, node similarity between nodes in different graphs (inter-graph nodes) has not received a lot of attention yet. The inter-graph node similarity is important in learning a new graph based on the knowledge of an existing graph (transfer learning on graphs) and has applications in biological, communication, and social networks. I...
Designing an A* Algorithm for Calculating Edit Distance between Rooted-Unordered Trees
Tree structures are useful for describing and analyzing biological objects and processes. Consequently, there is a need to design metrics and algorithms to compare trees. A natural comparison metric is the "Tree Edit Distance," the number of simple edit (insert/delete) operations needed to transform one tree into the other. Rooted-ordered trees, where the order between the siblings is significa...
Complexity of Computing Distances between Geometric Trees
Geometric trees can be formalized as unordered combinatorial trees whose edges are endowed with geometric information. Examples are skeleta of shapes from images; anatomical tree-structures such as blood vessels; or phylogenetic trees. An inter-tree distance measure is a basic prerequisite for many pattern recognition and machine learning methods to work on anatomical, phylogenetic or skeletal ...
A Polynomial-Time Metric for Attributed Trees
We address the problem of comparing attributed trees and propose a novel distance measure centered around the notion of a maximal similarity common subtree. The proposed measure is general and defined on trees endowed with either symbolic or continuous-valued attributes, and can be equally applied to ordered and unordered, rooted and unrooted trees. We prove that our measure satisfies the metri...